This paper is an extended abstract of an analysis of term rewriting where the terms in the rewrite rules
as well as the term to be rewritten are compressed by a singleton tree grammar (STG). This form of 
compression is more general than node sharing or representing terms as dags since also partial trees 
(contexts) can be shared in the compression. In the first part efficient but complex algorithms for 
detecting applicability of a rewrite rule under STG-compression are constructed and analyzed. The 
second part applies these results to term rewriting sequences. 

The main result for submatching is that finding a redex of a left-linear rule can be performed in 
polynomial time under STG-compression. 

The main implications for rewriting and (single-position or parallel) rewriting steps are: (i) under
STG-compression, n rewriting steps can be performed in nondeterministic polynomial time, (ii) under
STG-compression and for left-linear rewrite rules, a sequence of n rewriting steps can be performed
in polynomial time, and (iii) for compressed rewrite rules where the left hand sides are either
DAG-compressed or ground and STG-compressed, and an STG-compressed target term, n rewriting
steps can be performed in polynomial time.

1 Introduction 

Terms (ranked trees), and also terms containing variables, are an important concept in various areas
of computer science such as automated deduction, first-order logic, term rewriting, and type checking
(see e.g. Q). The basic and widely used algorithms in these areas are matching, unification, term
rewriting, equational deduction, and so forth. For example, a term f(g(a,b),c) may be rewritten into
f(g(b,a),c) by the commutativity axiom g(x,y) = g(y,x) for g. Since implemented systems often deal
with large terms, perhaps generated ones, it is of high interest to look for compression mechanisms
for terms and, consequently, to investigate variants of the known algorithms that also perform
efficiently on the compressed terms without prior decompression.

The device of straight line programs (SLP) for compression of strings is a general one and allows
analyses of correctness and complexity of algorithms ET1 [16]. SLPs are polynomially equivalent to the
LZ77-variant of Lempel-Ziv compression [25]. SLPs are non-cyclic context free grammars (CFGs),
where every nonterminal has exactly one production in the CFG, such that any nonterminal represents
exactly one string. Basic algorithms are the equality check of two compressed strings, which requires
polynomial time [19] (see [15] for an efficient version and [11] for a proposal of a further improvement),
and the compressed pattern match, i.e., given two SLP-compressed strings s,t, the question whether s is
a substring of t can also be solved in polynomial time in the size of the SLPs.

A generalization of SLPs for the compression of terms are singleton tree grammars (STG) E2l[T3l|7l, a
specialization of straight line context free tree grammars [4, 5, 17, 18], where linear SLCF tree grammars
are polynomially equivalent to STGs [17, 18]. Basic notions for tree grammars and tree automata can be
found in O. Besides the well-known node sharing, partial subtrees (contexts) can also be shared
in the compression. The Plandowski-Lifshits equality test of nonterminals can be generalized to STGs
and requires polynomial time El|22l in the size of the STG.

A naive generalization of the pattern match is to find a compressed ground term in another compressed
ground term, which can be solved by translating this problem into a pattern match of compressed preorder
traversals of the terms. A generalization of the pattern match is the following submatching problem (also
called encompassment): given two (STG-compressed) terms s,t, where s may contain variables, is there
an occurrence of an instance of s in t? A special case is matching, where the question is whether there is
a substitution $\sigma$ such that $\sigma(s) = t$, which is shown to be in PTIME in [7, 8], including the
computation of the (unique) compressed substitution.

In this extended abstract (of [23]) we report informally on progress in finding algorithms that answer
the submatching question and operate only on the STGs. We show that if s is STG-compressed and linear,
then submatching can be solved in polynomial time (Theorem 3.7). If s is ground and compressed, or s is
DAG-compressed, we describe less complex algorithms that solve the submatching question in polynomial
time (Theorem 4.1 and Theorem 4.3). In the general case, we describe a non-deterministic algorithm that
runs in polynomial time. The deterministic algorithm runs in time $O(n^{c\,|FVmult(s)|})$ (Theorem 4.4),
where n is the size of the STG and FVmult(s) is the set of variables occurring more than once in s. This
is an exponential-time algorithm, but in a well-behaved parameter.
As an application and an easy consequence of the submatching algorithms, a (single-position or parallel)
deduction step on compressed terms by a compressed left-linear rewriting rule can be performed in
polynomial time. We also show that a sequence of n rewrites with an STG-compressed left-linear term
rewriting system on an STG-compressed target term can be performed in polynomial time (see
Theorem 5.1). Our result confirms results on the complexity of rewrite derivations under
DAG-compression [1], namely that rewrite systems with a polynomial runtime complexity can be
implemented such that the algorithm requires polynomial time.

Example 1.1 Consider the term rewriting rule $f(x) \to g(x,b)$, and let the term $t_1 = f(f(f(a)))$ be
compressed as $C_1 \to f(\cdot)$, $C_2 \to C_1C_1$, $T \to C_2(T')$, $T' \to f(a)$. A single term rewriting
step on the compressed term $t_1$ by the rule $f(x) \to g(x,b)$ would produce $T' \to g(a,b)$, and hence the
reduced and decompressed term is $f(f(g(a,b)))$. Other rewriting steps on the compressed term that do not
decompress the term have to analyze the contexts. Let another term be $t_2 = f^{16}(a)$, compressed as
$C_1 \to f(\cdot)$, $C_2 \to C_1C_1$, $C_3 \to C_2C_2$, $C_4 \to C_3C_3$, $C_5 \to C_4C_4$, $T \to C_5(a)$.
A term rewriting step on T using $f(x) \to g(x,b)$ may rewrite the context $f(\cdot)$ and thus would produce
$C_1 \to g(\cdot,b)$, and hence reduces the term in one blow to $g(\ldots(g(\ldots,b)\ldots),b)$, which is a
parallel rewriting step, see Section 5.

The structure of this extended abstract (of [23]) is as follows. First the basic notions, in particular STGs,
are introduced in Section 2. An algorithm for linear submatching is explained in Section 3. In Section 4
we explain submatching for some special cases and also a general non-deterministic algorithm for term
submatching of compressed patterns and terms. Finally, in Section 5 we illustrate the application in term
rewriting and argue that n rewrites for a left-linear TRS can be performed in polynomial time.



2 Preliminaries 



We will use standard notation for signatures, terms, positions, and substitutions (see e.g. O). A position
is a word over positive integers. For two positions $p_1, p_2$, we write $p_1 \le p_2$ if $p_1$ is a prefix of
$p_2$, and $p_1 < p_2$ if $p_1$ is a proper prefix of $p_2$. We call two strings $w_1, w_2$ compatible if $w_1$
is a prefix of $w_2$ or $w_2$ is a prefix of $w_1$. We write $p[i]$ for the symbol of p at index i, and $p[i,j)$
for the substring of p starting at i and ending at j. The set of free variables in a term t is denoted as
FV(t). Let FVmult(s) be the set of variables occurring more than once in s. Terms without occurrences of
variables are called ground. A term where every variable occurs at most once is called linear. A context
is a term with a single hole, denoted as $[\cdot]$. Sometimes it is convenient to view a linear term
containing one variable as a context, where the single variable represents the hole. As a generalization,
a multicontext is a linear term, where the variable occurrences are also called holes. Let holep(c) be the
position (as a string of numbers) of a hole in a context c, and let the hole depth be the length of
holep(c). If $c = c_1[c_2]$ for contexts $c, c_1, c_2$, then $c_1$ is a prefix context of c and $c_2$ is a
suffix context of c. The notation $c[s]$ means the term constructed from the context c by replacing the
hole with s. An n-fold iteration of a context c is denoted as $c^n$; for example $c^3$ is $c[c[c]]$. A
substitution $\sigma$ is a mapping on variables, extended homomorphically to terms by
$\sigma(f(t_1,\ldots,t_n)) = f(\sigma(t_1),\ldots,\sigma(t_n))$.
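
The notions of context filling, iteration, hole position and substitution can be made concrete on uncompressed terms. The following is a minimal sketch, assuming terms are nested tuples `(symbol, child_1, ..., child_n)`, variables are plain strings, and the hole is the marker `HOLE`; all names are illustrative.

```python
HOLE = "[.]"

def fill(context, s):
    """Return c[s]: the context with its hole replaced by s."""
    if context == HOLE:
        return s
    if isinstance(context, str):          # a variable: nothing to replace
        return context
    return (context[0],) + tuple(fill(child, s) for child in context[1:])

def power(context, n):
    """Return the n-fold iteration c^n (c^0 is the empty context)."""
    result = HOLE
    for _ in range(n):
        result = fill(context, result)
    return result

def hole_path(context):
    """Return holep(c) as a tuple of child indices, or None if there is no hole."""
    if context == HOLE:
        return ()
    if isinstance(context, str):
        return None
    for i, child in enumerate(context[1:], start=1):
        path = hole_path(child)
        if path is not None:
            return (i,) + path
    return None

def substitute(sigma, t):
    """Apply sigma homomorphically: sigma(f(t1..tn)) = f(sigma(t1)..sigma(tn))."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(sigma, child) for child in t[1:])

c = ("f", ("a",), HOLE)                   # the context f(a, [.])
c3 = power(c, 3)                          # c[c[c]]
```

Here `hole_path(c3)` is the position 2.2.2, so the hole depth of $c^3$ is 3.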

Definition 2.1 A term rewriting system (TRS) R is a finite set of pairs $\{(l_i, r_i) \mid i = 1,\ldots,n\}$,
called rewrite rules, written $\{l_i \to r_i\}$, where we assume that for all i: $l_i$ is not a variable, and
$FV(r_i) \subseteq FV(l_i)$.

A term rewriting step by R is $t \to t'$, if for some i: $t = c[\sigma(l_i)]$ and $t' = c[\sigma(r_i)]$ for
some context c and some substitution $\sigma$.
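
On uncompressed terms, a rewriting step in the sense of Definition 2.1 amounts to matching the left-hand side at some position and replacing the instance. The sketch below replays the commutativity example from the introduction; the tuple encoding of terms and the leftmost-outermost choice of the redex are assumptions made for illustration only.

```python
def match(pattern, term, sigma=None):
    """Return sigma with sigma(pattern) == term, or None if no match exists."""
    sigma = dict(sigma or {})
    if isinstance(pattern, str):                      # pattern is a variable
        if pattern in sigma and sigma[pattern] != term:
            return None                               # clash for a repeated variable
        sigma[pattern] = term
        return sigma
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_child, t_child in zip(pattern[1:], term[1:]):
        sigma = match(p_child, t_child, sigma)
        if sigma is None:
            return None
    return sigma

def substitute(sigma, t):
    """Apply the substitution sigma homomorphically to t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(sigma, child) for child in t[1:])

def rewrite_once(lhs, rhs, term):
    """Rewrite the leftmost-outermost redex of lhs -> rhs, or return None."""
    sigma = match(lhs, term)
    if sigma is not None:
        return substitute(sigma, rhs)
    if isinstance(term, str):
        return None
    for i, child in enumerate(term[1:], start=1):
        new_child = rewrite_once(lhs, rhs, child)
        if new_child is not None:
            return term[:i] + (new_child,) + term[i + 1:]
    return None

a, b, c = ("a",), ("b",), ("c",)
t = ("f", ("g", a, b), c)
result = rewrite_once(("g", "x", "y"), ("g", "y", "x"), t)   # g(x,y) -> g(y,x)
```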

2.1 Tree Grammars for Compression 

First we introduce string compression: A straight line program (SLP) is a context-free grammar that
generates one word, has no cycles, and for every nonterminal A there is exactly one production of the
form $A \to A_1A_2$ or $A \to a$.
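
As an illustration of why SLPs admit efficient algorithms, lengths and single characters of the generated word can be obtained without decompression. The grammar below, for the word $(ab)^8$, and the function names are hypothetical; the sketch does not implement the equality test or pattern matching cited above.

```python
from functools import lru_cache

# Each nonterminal has exactly one production: a terminal character
# or a pair of nonterminals.
slp = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),      # ab
    "X2": ("X1", "X1"),    # (ab)^2
    "X3": ("X2", "X2"),    # (ab)^4
    "X4": ("X3", "X3"),    # (ab)^8
}

@lru_cache(maxsize=None)
def length(n):
    """Length of the word generated by n, computed bottom-up."""
    rhs = slp[n]
    return 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])

def char_at(n, i):
    """Return the character at 0-based index i of the word generated by n."""
    rhs = slp[n]
    while not isinstance(rhs, str):
        left, right = rhs
        if i < length(left):
            n = left
        else:
            i -= length(left)
            n = right
        rhs = slp[n]
    return rhs

def expand(n):
    """Full decompression; only used to check small examples."""
    rhs = slp[n]
    return rhs if isinstance(rhs, str) else expand(rhs[0]) + expand(rhs[1])
```

The walk in `char_at` visits one production per level of the grammar, so its cost is bounded by the depth of the SLP rather than by the length of the word.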

An application for SLPs is the representation of compressed positions in compressed terms. We will use 
the well-known (polynomial-time) algorithms, constructions and their complexities on SLPs like equality 
check of compressed strings, computing prefixes, suffixes, the common prefix (suffix) of two strings (see 
EI]|9l[l9l|20l[l2l[l5l[ll). 

We consider compression of terms using tree grammars: 

Definition 2.2 A singleton tree grammar (STG) is a 4-tuple $G = (\mathcal{TN}, \mathcal{CN}, \Sigma, R)$,
where $\mathcal{TN}$ are tree/term nonterminals of arity 0, $\mathcal{CN}$ are context nonterminals of
arity 1, and $\Sigma$ is a signature of function symbols (the terminals), such that the sets
$\mathcal{TN}$, $\mathcal{CN}$, and $\Sigma$ are finite and pairwise disjoint. The set of nonterminals
$\mathcal{N}$ is defined as $\mathcal{N} = \mathcal{TN} \cup \mathcal{CN}$. The productions in $R$ must be
of the form:

• $A \to f(A_1,\ldots,A_m)$, where $A, A_i \in \mathcal{TN}$, and $f \in \Sigma$ is an m-ary terminal symbol.

• $A \to C_1A_2$, where $A, A_2 \in \mathcal{TN}$, and $C_1 \in \mathcal{CN}$.

• $C \to [\cdot]$, where $C \in \mathcal{CN}$.

• $C \to C_1C_2$, where $C, C_1, C_2 \in \mathcal{CN}$.

• $C \to f(A_1,\ldots,A_{i-1},[\cdot],A_{i+1},\ldots,A_m)$, where $A_1,\ldots,A_{i-1},A_{i+1},\ldots,A_m \in
\mathcal{TN}$, $C \in \mathcal{CN}$, and $f \in \Sigma$ is an m-ary terminal symbol.

• $A \to A_1$ ($\lambda$-production), where $A$ and $A_1$ are term nonterminals.

Let $N_1 >_G N_2$ for two nonterminals $N_1, N_2$ iff $(N_1 \to t) \in R$ and $N_2$ occurs in t. The STG must
be non-cyclic, i.e. the transitive closure $>_G^+$ must be irreflexive. Furthermore, for every nonterminal N
of G there is exactly one production having N as left-hand side. Given a term t with occurrences of
nonterminals, the derivation of t by G is an exhaustive iterated replacement of the nonterminals by the
corresponding right-hand sides. The result is denoted as $val_G(t)$. We will write val(t) when G is clear
from the context. In the case of a nonterminal N of G, we also say that N (or G) generates $val_G(N)$ or
compresses $val_G(N)$. The depth of a nonterminal N is the maximal number of $>_G$-steps starting from N,
and the depth of G is the maximal depth of all its nonterminals. The size of an STG is the number of its
productions, denoted as $|G|$.

Definition 2.3 Let G be an STG and V be a set of variables. Then (G,V) is an STG with variables,
where additional production forms are permitted:

• $A \to x$, where $A \in \mathcal{TN}$ and $x \in V$.

• $x \to A$ ($\lambda$-production), where $x \in V$ and $A \in \mathcal{TN}$.

This means that variables may be terminals or nonterminals, depending on the existing productions. The
measure Vdepth(N,V) is defined as the maximal number of $>_G$-steps starting from N until an element of
V or a terminal is reached, and Vdepth(G,V) is the maximum over all nonterminals.
In the following we always mean STG with variables if variables are present.

An STG G is called a DAG, if there are no context nonterminals. □

The compression rate may be exponential in the best case, but not larger: the size of terms represented
with an STG G is at most $O(2^{|G|})$. Note that the term depth of DAG-compressed terms is at most the
size of the DAG, whereas the term depth of STG-compressed terms may be exponential in the size of
the STG. Note also that every subterm in a DAG-compressed term is represented by a nonterminal,
whereas in STG-compressed terms, there may be subterms that are only implicitly represented. It is
known that several computations in SLPs and STGs, for example length computations, can be done in
polynomial time. Several forms of extensions of STGs are well-behaved, such that even a sequence of n
such extensions will lead to only polynomial size growth.
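
The exponential compression rate and the polynomial-time size computation can both be seen on a small family of grammars. The sketch below is illustrative only: the dictionary encoding with the tags "fun" and "comp", and the names `doubling_grammar` and `symbol_count`, are assumptions and not notation from the paper.

```python
from functools import lru_cache

HOLE = "[.]"

def doubling_grammar(k):
    """Hypothetical STG with k + 3 productions generating f^(2^k)(a)."""
    g = {"A": ("fun", "a", []), "C0": ("fun", "f", [HOLE])}
    for i in range(1, k + 1):
        g[f"C{i}"] = ("comp", f"C{i-1}", f"C{i-1}")   # C_i -> C_{i-1} C_{i-1}
    g["T"] = ("comp", f"C{k}", "A")                   # T -> C_k A
    return g

def symbol_count(g, start):
    """Number of function symbols in val(start), without decompression."""
    @lru_cache(maxsize=None)
    def size(n):
        if n == HOLE:
            return 0
        kind, *args = g[n]
        if kind == "fun":
            return 1 + sum(size(child) for child in args[1])
        return size(args[0]) + size(args[1])          # "comp": sizes add up
    return size(start)
```

For k = 60 the grammar has 63 productions, while the generated term has $2^{60} + 1$ symbols and the same depth, so neither the term nor its hole path could be written out explicitly.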

Compressed Matching. The investigation in [7] shows that (exact) term matching, also in the fully
compressed version including the computation of a compressed substitution, is polynomial. I.e. given
two nonterminals S, T, where S may contain variables, there is a polynomial time algorithm for answering
the question whether there is some substitution $\sigma$ such that $\sigma(val(S)) = val(T)$, and also for
computing the substitution, where the representation is a list of variable-nonterminal pairs, and the
nonterminals belong to an extension of the input STG.

Compressed Submatching. Given two first-order terms s,t, where s (the pattern) may contain variables, 
the submatching problem is to identify an instance of s as a subterm of t. Submatching (also called 
encompassment relation) is a prerequisite for term rewriting. 

Definition 2.4 The compressed term submatching problem is: 

Assume given a term s which may contain variables, and a (ground) term t, both compressed with an STG
$G = G_s \cup G_t$, such that val(T) = t and val(S) = s for term nonterminals $S \in G_s$, $T \in G_t$. The
task is to compute a (compressed) substitution $\sigma$ such that $\sigma(s)$ is a subterm of t; also the
(compressed) position (all positions) p of the match in t should be computed. Specializations are:
uncompressed, if s is given as a plain term without any compression; ground, if s is ground;
DAG-compressed, if s is DAG-compressed; and linear, if s is a linear term, i.e. every variable occurs at
most once in s.

Lemma 2.5 Let G be an STG, s a term, and T a nonterminal with $val_G(T) = t$, where t is ground. If
there is some substitution $\sigma$ such that $\sigma(s)$ is a subterm of t, then there are the following
possibilities:

1. There is a term nonterminal B of G such that $val_G(B) = \sigma(s)$.

2. There is a production $B \to CB'$ in G such that $\sigma(s) = c[val_G(B')]$, where c is a nontrivial
suffix context of $val_G(C)$. There are subcases for the hole position p of c.

(a) (overlap case) p is a position in s.

(b) $p = p_1p_2$, where $p_1$ is the maximal prefix of p that is also a position in s. Then
$s|_{p_1} = x$ is a variable. The algorithms below have to distinguish the subterm case where x occurs
more than once in s and the subcontext case where x occurs exactly once in s.

Figure 1: Non-compatible, parallel and sequential overlap of c with d. Panels: (a) non-compatible
overlap, (b) parallel, (c) sequential. Subfigures (b) and (c) only show the hole path of two occurrences
of the context c.

3 Term Submatching with Linear Terms 

Overlaps of Linear Terms and Contexts. An important concept and technique used is periodicity of
contexts. This is a generalization of periodicity of strings: for example the string "bcabcabc" is periodic
with period length 3. A context c is called periodic if $c = d^n d'$ for some contexts $d, d'$ and a positive
integer n, where $d'$ is a prefix of d. This is even generalized to multicontexts c (linear terms, where the
variables are the holes), and where periodicity means that c can be overlapped with itself at periodic
positions without conflicts.
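
The string case can be checked directly. The following sketch computes all period lengths of an uncompressed word and the decomposition $w = d^n d'$; the function names are illustrative, and nothing here operates on compressed input.

```python
def periods(w):
    """All p with 1 <= p <= len(w) such that w[i] == w[i + p] wherever both exist."""
    return [p for p in range(1, len(w) + 1)
            if all(w[i] == w[i + p] for i in range(len(w) - p))]

def as_power(w, p):
    """Split w as d^n d' with |d| = p and d' a proper prefix of d."""
    d, n = w[:p], len(w) // p
    return d, n, w[n * p:]

d, n, rest = as_power("bcabcabc", 3)
```

For "bcabcabc" the smallest period length is 3, with d = "bca", n = 2 and d' = "bc".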

We consider overlapping multicontexts $c, c_1, c_2, \ldots$ and a context d. In particular, special variants
of overlaps have to be analyzed: overlaps where the hole of d is not compatible with any hole of c. The
overlaps where a hole of c is compatible with a hole of d can be dealt with by generalizing results from
words (or words with character-holes). If there are non-compatible overlaps of copies of c with d, then
only two configurations are possible: parallel and sequential (see Proposition 3.2 and Fig. 1), and there
are no mixed configurations. Thus, periodicities in linear terms are not only possible along the hole-path
of d but also along other paths, and there are two different kinds of such periodicities: the parallel and
the sequential variant. A helpful technical result is a periodicity theorem that tells us that a multi-context
c is periodic, if there is a multiple overlap of h + 2 copies of c where h is the number of holes, and the
overlap is sufficiently dense. This will be used in the submatching algorithm for linear terms.

Example 3.1 Let $d = f(a_1, f([\cdot], a_1))$ and let $c = f(a_1, [\cdot])$. Then c overlaps d at position
$\varepsilon$, which is a compatible overlap, since the start as well as the hole position of c is on the
hole path of d. The overlap of c with d at position 2 (in d) is a non-compatible overlap, since the hole of
c is at 2.2, which is not a prefix or suffix of the hole path of d, which is 2.1.

Proposition 3.2 Let c be a multicontext with at least one hole, and let d be a context with exactly one
hole, and let $p_1 < p_2$ be two positions of non-compatible overlaps of c in d. Let $q_i$ be the maximal
common hole path (mchp) of c at $p_i$ for i = 1,2. Then there are the following two cases (see Figure 1):

1. $q_1 = q_2$ (the parallel overlap case). Then for $p'$ such that $p_1p' = p_2$ the path $p_1(p')^n$ is
compatible with holep(d) for all n. Also, this is a multiple overlap of $c'$ with itself at positions
$(p')^i$, where $c'$ is constructed from c with an extra hole at $p''$, where $p_1p'' = holep(d)$.

2. $q_2 < q_1$ (the sequential overlap case). Then $p_1q_1 = p_2q_2$. I.e., there is a fixed position on the
hole path of d, where the hole paths of occurrences of c deviate.

Example 3.3 Let $c' = f(f(a_1,a_2), [\cdot])$ be a context, $c = f(f(x,y), (c')^m[\cdot])$, and let
$d = (c')^m[\cdot]$. Then there is an overlap of c with d at positions $\varepsilon, 2, 2.2, \ldots$. It is an
overlap of the first kind, i.e. a parallel overlap. A sequential overlap is the following: Let
$c = f(a_1, f(a_1, f(a_1, [\cdot])))$ and let
$d = f(a_1, f(a_1, f(a_1, f([\cdot], f(a_1, f(a_1, a_1))))))$. Then the overlap positions are
$\varepsilon, 2, 2.2, 2.2.2$.

Theorem 3.4 (Periodicity Theorem) Let c be a multi-context with $h \ge 1$ holes. Let p be the position of
a fixed hole of c, and let $p_i$, $i = 1,\ldots,n$, be prefixes of p such that $i < j$ implies $p_i < p_j$,
with $n \ge h + 2$. Assume that there is a (right-cut) overlap of n copies of c starting at position $p_i$
such that p is a prefix of $p_ip$, i.e., the hole position of c starting at $p_i$ is compatible with p for
all i, and only positions in c at $p_1$ are relevant for the overlap. Let $p_{max}$ be
$\max\{|p_{i+1}| - |p_i| \mid i = 1,\ldots,n-1\}$. Assume $|p| - |p_n| \ge 2h \cdot p_{max}$; this means there
are $2h \cdot p_{max}$ common positions on the path p of all occurrences of c.
Then the multicontext c is periodic (in the direction p), and a period length is
$p_{all} := \gcd(|p_2| - |p_1|, |p_3| - |p_2|, \ldots, |p_n| - |p_{n-1}|)$. Moreover, the overlap is
consistent with using the same substitution for the variables for every occurrence of c.

Tabling Prefixes of Multicontexts in Contexts. 

The core of the algorithm for finding submatches of a linear term s in other terms (under
STG-compression) is the construction of a table in dynamic-programming style. The table contains overlaps
of s with contexts that are explicitly represented in the STG G by a context nonterminal. In fact the table
is split into several tables: There is a table per context nonterminal A of G and per variable (hole) of s
for the compatible overlaps. In addition there is an extra table for non-compatible overlaps. This makes
h + 1 tables where h is the number of variables of s.

The entries in the tables are pairs of a position and a substitution necessary for the overlap. Since terms of
exponential size and depth may be represented in the STG G, a compact representation of a large number
of entries is necessary in order to keep the tables of polynomial size. Indeed this is possible by exploiting
periodicity. If the entries in a table are sufficiently dense, then the periodicity theorem implies
that a large subset of the entries enjoys regularities, and a series of periodic overlaps can be represented
in one entry, consisting of: a start position, a period (a position, respectively a context nonterminal), and
the number of successive entries.
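
The idea of a compact periodic entry can be sketched as follows. This is a simplification under stated assumptions: positions are plain tuples of child indices rather than compressed positions, the period is a non-empty position rather than a context nonterminal, substitutions are omitted, and the class name `PeriodicEntry` is illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PeriodicEntry:
    start: tuple      # start position, a tuple of child indices
    period: tuple     # non-empty position appended for each further overlap
    count: int        # number of successive overlaps represented

    def positions(self):
        """Expand the entry; only sensible when count is small."""
        return [self.start + self.period * k for k in range(self.count)]

    def contains(self, position):
        """Membership test that never expands the entry."""
        if position[:len(self.start)] != self.start:
            return False
        rest = position[len(self.start):]
        k, r = divmod(len(rest), len(self.period))
        return r == 0 and k < self.count and rest == self.period * k

# The sequential overlap positions epsilon, 2, 2.2, 2.2.2 as one entry.
entry = PeriodicEntry(start=(), period=(2,), count=4)
```

A single triple stands for all positions of the series, which is what keeps the tables small even when the count is exponential in the size of the grammar.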

In more detail, the construction of the prefix tables is bottom-up w.r.t. the grammar, where the
productions $A \to A_1A_2$ for context nonterminals permit constructing the A-tables from the
$A_1$- and $A_2$-tables, and where the start are the contexts with hole-depth 1. This construction must
take into account the compact representation of the entries: single ones and periodic ones, which makes the
description of the algorithm rather complex due to lots of cases. The construction of the prefix table in
the case $A \to A_1A_2$ and the periodic cases is depicted in Figure 2, where (a) shows the case where A has
a periodic suffix, (b) shows the case where A has an inner part that is periodic, (c) shows a case where the
periodicity goes into a direction that is not compatible with the hole of $A_1$, which leads to the
sequential overlap case; and (d) is a case of a sequential overlap already in the table for $A_1$. The
generation of the periodic entries is done in an extra step: compaction, where the periodic overlaps are
detected by searching for sufficiently dense entries. This is the only place where periodic entries are
generated.

In addition to the prefix tables there is a result table, which contains the detected submatchings, and 
which is maintained during construction of the prefix tables. 

Since it is necessary to also have submatchings in terms, i.e. for term nonterminals, we keep things simple
and assume that every production for a term nonterminal is of the form $A \to CA_1$, where $A_1$ is a term
nonterminal with production $A_1 \to a$, i.e. a constant. This rearrangement of G can be done efficiently,
and thus does not restrict generality. For these nonterminals the extraction of the submatchings can be
done using the already constructed prefix-tables.

Note that during construction of the tables, the STG G may have to be extended in every step.
Example 3.5 We describe several small examples for compatible entries in a prefix table. For this we
slightly extend Example 3.3. Let the STG be $S \to A$; $A \to A_1A_1$; $A_1 \to A_2A_2$;
$A_2 \to f(a_1, [\cdot])$.

1. Then $(C, A_2, \infty)$ for $C \to [\cdot]$ is a potential entry in a result table for A.

2. Let $A_4 \to g([\cdot])$, $B \to A_4A_1$, $C' \to A_4$. Then $(C', A_2, \infty)$ is an entry in the
result table for B.

3. Let $B' \to BA_4$; then $(A_4, A_2, 2)$ is a potential entry in the result table for $B'$.

4. The tuple $(A_4, A_2, 3)$ is an entry in the prefix table for B.

5. Let $B'' \to A_6A_4$, $A_6 \to A_4A_1$. The context $A_6$ is then a potential entry in the result and
prefix tables of $B''$.

Note that item 4 cannot be used as a result, since composing B as in $B' \to BA_4$ in item 3 may render an
overlap invalid.

Example 3.6 We describe an example for a non-compatible entry in a prefix table. For this we
slightly modify Example 3.3. Assume there is an STG G. Let
$c = f(a_1, f(a_1, f(a_1, f(a_1, [\cdot]))))$,
$d = f(a_1, f(a_1, f(a_1, f([\cdot], f(a_1, f(a_1, a_1))))))$, and let $P, D, C_0, S$ be nonterminals such
that $val(P) = f(a_1, [\cdot])$, $val(D) = d$, $val(S) = c$, $val(C_0) = [\cdot]$. Then an entry in the
non-compatible prefix table for D could be $(C_0, P, 3)$.

Theorem 3.7 (Linear Submatching) Let G be an STG, and S, T be two term nonterminals such that
val(S) is a linear term, and the submatching positions of val(S) in val(T) are to be determined. Then
the algorithm for linear submatchings computes an $O(|G|^5)$-sized representation of all submatchings of
val(S) in val(T) in polynomial time dependent on the size of G.



4 Submatching Algorithms for Other Cases 

We consider several specialized situations: ground terms, uncompressed patterns, DAG-compressed 
terms, and also non-linear terms. 

4.1 Ground Term Submatching 

If s is ground and compressed by a nonterminal S, then submatching can be solved in polynomial time
by translating both compressed terms into their compressed preorder traversals (i.e. strings) [4, 5] and
then applying string pattern matching ETl [TBI. The string matching algorithm in [15, 11] computes a
polynomial representation of all occurrences. Note that in our case, the structure of ground terms is
very special as a string matching problem: periodic overlaps of the preorder traversal as strings are not
possible. Thus the complete output of the algorithm is as follows: (i) a list of term nonterminals N of
the input STG G, where $val(\sigma(S)) = val(N)$, and (ii) a list of pairs (N, p), where the production for
N is of the form $N \to CN'$, p is a compressed position, and $val(C)|_{val(p)}[val(N')] = val(S)$.
Moreover, every nonterminal N appears at most once in the list.

Figure 2: Cases in the construction of the prefix tables for periodic entries. Panels: (a), (b), (c), (d).

The required time for string matching is $O(n^2m)$ where n is the size of the SLP of T and m is the size of
the SLP of S. Since the preorder traversal can be computed in linear time (see [8]), we have:

Theorem 4.1 The ground compressed term submatching can be computed in time $O(|G_t|^2|G_s|)$, and the
output is a list of linear size.
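
The reduction to string matching can be illustrated on uncompressed ground terms. The sketch below uses a naive scan over explicit preorder traversals in place of the compressed string matching algorithms cited above, so it shows only the correspondence between subterm occurrences and substring occurrences, not the stated complexity. Terms are assumed to be nested tuples over a ranked signature, and the function names are illustrative.

```python
def preorder(t):
    """Preorder traversal of a ground term as a list of symbols."""
    out = [t[0]]
    for child in t[1:]:
        out.extend(preorder(child))
    return out

def ground_submatches(s, t):
    """Start indices in preorder(t) at which the traversal of s occurs."""
    ps, pt = preorder(s), preorder(t)
    return [i for i in range(len(pt) - len(ps) + 1) if pt[i:i + len(ps)] == ps]

a, b = ("a",), ("b",)
t = ("f", ("g", a, b), ("g", a, b))
s = ("g", a, b)
occurrences = ground_submatches(s, t)
```

Because every symbol has a fixed arity, the traversal of a subterm is determined by its start index, so each substring occurrence corresponds to exactly one subterm occurrence.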

4.2 DAG-Compressed Non-Linear Submatching 

Now we look for the case of DAG-compressed s, which is slightly more general than the uncompressed 
case, and where variables may occur several times in s. Also for this case, there is an algorithm for 
submatching that requires polynomial time. The algorithm outputs enough information to determine all 
the positions and substitutions of a submatch. 

Example 4.2 The number of possible substitutions for a submatch in a DAG-compressed term may
be exponential: Let the productions be $S \to f(x,y)$, and $T \to f(A_1,A_1)$, $A_1 \to f(A_2,A_2)$,
$\ldots$, $A_{n-1} \to f(A_n,A_n)$, $A_n \to a$. Then val(T) is a complete binary tree of depth n and there is
a submatch at every non-leaf node. Clearly, it is sufficient to have all $A_i$ as submatchings in the
output, which is of linear size.
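
The gap between the number of match positions and the number of nonterminals in Example 4.2 can be computed directly on the DAG. The encoding below, a dictionary from nonterminals to `(symbol, child, ...)` tuples, and the function names are illustrative assumptions.

```python
from functools import lru_cache

def example_dag(n):
    """Productions T -> f(A1,A1), A_i -> f(A_{i+1},A_{i+1}) for i < n, A_n -> a."""
    g = {"T": ("f", "A1", "A1"), f"A{n}": ("a",)}
    for i in range(1, n):
        g[f"A{i}"] = ("f", f"A{i+1}", f"A{i+1}")
    return g

def count_submatch_positions(g, start, symbol="f"):
    """Number of positions in val(start) rooted by `symbol`, i.e. where f(x,y) matches."""
    @lru_cache(maxsize=None)
    def count(n):
        head, *children = g[n]
        return (head == symbol) + sum(count(child) for child in children)
    return count(start)

def matching_nonterminals(g, symbol="f"):
    """Nonterminals whose production has `symbol` at the root."""
    return sorted(n for n, rhs in g.items() if rhs[0] == symbol)
```

For n = 10 there are $2^{10} - 1 = 1023$ match positions in val(T), but only 10 nonterminals whose production is rooted by f, so listing nonterminals keeps the output linear in the grammar.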

In the case of a DAG-compressed or uncompressed pattern-term (not necessarily linear) s and
STG-compressed target term t, the algorithm for computing all submatchings is designed in dynamic
programming style. It constructs a table of possible submatchings of s in the context nonterminals
corresponding to t. The key of the table is (C,p), where C is a context nonterminal, and p a position that
is a suffix of val(C) as well as a position in s. The number of these positions is linear in
$|G_s| + |G_t|$ for every context. The entries are substitutions into the variables of s, i.e. a list of
pairs $(x_i, A_i)$, where $A_i$ is a term nonterminal representing a ground term. There is also a result
list of found submatchings in contexts C contributing to T, and term nonterminals for ground terms that are
instances of s. The construction proceeds again bottom-up in the STG $G_t$ for context nonterminals, and
for $A \to A_1A_2$, constructs the table for A from the tables for $A_1, A_2$, and in case a full
submatching is found, inserts a result into the result list.

Figure 3: Cases in the construction of the s-in-C-table for DAG-compression

Finally, from this information, a representation of all submatchings can be constructed by looking at
the right hand sides of the productions $A \to CB$ for term nonterminals, and using the table entries for C,
and also constructing the occurrences of the ground terms.

Theorem 4.3 Let G be an STG, and S, T be two term nonterminals such that S is DAG-compressed. 
Then the submatch computation problem can be solved in polynomial time. Also an explicit polynomial 
representation of all matching possibilities can be computed in polynomial time. 

4.3 A Non-Deterministic Algorithm for Sub-Matching in the General Case 

The submatching problem for STG-compressed pattern terms that may be nonlinear can be solved by a
relatively easy search that leads to a non-deterministic polynomial time algorithm: Given S, with
non-linear s = val(S), extract and construct a nonterminal B representing a subterm $f(r_1,\ldots,r_n)$ of
s such that two terms $r_i, r_j$ contain a common variable. Then non-deterministically choose a right hand
side r of a production of $G_t$ of the form $f(\ldots)$, then compute the usual match of B with r using [7],
which will produce an instantiation of at least one variable of val(B), and hence of s. Then iterate this
until all variables with double occurrences are instantiated. For the resulting linear term we know how to
find all matching positions.

Theorem 4.4 (Nondeterministic General Submatch) Let G be an STG and S, T be two nonterminals
of G where val(S) may contain variables. Then the algorithm for fully compressed submatching for
compressed terms s,t requires at most searching in $|G|^{|FVmult(s)|}$ alternatives for the substitution,
and the computation for one alternative can be done in polynomial time. Thus the submatching problem is
in NP.



There remains a gap in the knowledge of the complexity of the fully compressed submatching problem
for terms, which for the decision problem is between PTIME and NP.

Remark 4.5 The non-linear submatching problem can be computed in polynomial time if there are few
variable occurrences ($\le |G|$) in s: First linearize s, then use the linear compressed submatch and then
perform a postprocessing step checking the equalities enforced by the variables of s.
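
The three steps of Remark 4.5 can be sketched on uncompressed terms. This is only an illustration of the control flow: plain structural matching replaces the linear compressed submatch, plain equality replaces the compressed equality test, and the names `linearize`, `match_linear` and `submatches` are hypothetical.

```python
from itertools import count

def linearize(s):
    """Rename variable occurrences apart; return the linear term and the renaming."""
    fresh, origin = count(), {}
    def go(t):
        if isinstance(t, str):                 # a variable occurrence
            name = f"{t}#{next(fresh)}"
            origin[name] = t
            return name
        return (t[0],) + tuple(go(child) for child in t[1:])
    return go(s), origin

def match_linear(p, t, sigma):
    """Match a linear pattern p against a ground term t, extending sigma in place."""
    if isinstance(p, str):
        sigma[p] = t
        return True
    return (p[0] == t[0] and len(p) == len(t)
            and all(match_linear(pc, tc, sigma) for pc, tc in zip(p[1:], t[1:])))

def submatches(s, t):
    """All (position, substitution) pairs such that an instance of s occurs in t."""
    linear, origin = linearize(s)
    results = []
    def visit(u, position):
        sigma = {}
        if match_linear(linear, u, sigma):
            merged, ok = {}, True
            for name, value in sigma.items():  # postprocessing: equality checks
                ok = ok and merged.setdefault(origin[name], value) == value
            if ok:
                results.append((position, merged))
        for i, child in enumerate(u[1:], start=1):
            visit(child, position + (i,))
    visit(t, ())
    return results

a, b = ("a",), ("b",)
found = submatches(("f", "x", "x"), ("g", ("f", a, a), ("f", a, b)))
```

In the example, the linearized pattern matches both f-subterms, and the equality check discards f(a,b).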



5 Polynomial Compressed Term Rewriting 

For our compressed representation the natural approach to rewriting is to use parallel rewriting of the
same subterm at several positions and by the same rewriting rule. Note, however, that the set of redexes
that are rewritten in parallel will depend on the structure of the STG $G_t$, and not on the structure of
the rewritten term t.

Let R be a compressed TRS, let t be a ground term with $val_G(T) = t$, and let R be compressed by the STG
$G_R$ as $\{L_i \to R_i \mid i = 1,\ldots,n\}$, where $L_i, R_i$ are term nonterminals.
A (parallel) term rewriting step is performed as follows:

First select $L_i \to R_i$ as the rule. There is an oracle, which is one of our submatching algorithms
applied to $L_i$, for finding the redex for $val(L_i)$ or the set of redexes, that provides the following:

1. An extension G' of G, i.e. additional nonterminals and productions.

2. A substitution $\sigma$ as a list of pairs $\{x_1 \mapsto A_1, \ldots, x_m \mapsto A_m\}$, where
$FV(val(L_i)) = \{x_1, \ldots, x_m\}$, the $A_i$ are term nonterminals in G', and $val(A_i)$ is a subterm
of t. It is also assumed that the instantiation is integrated in the grammar G' as productions
$x_i \to A_i$ for $i = 1,\ldots,m$.

3. A term nonterminal A (corresponding to $L_i$) in G' which contributes to val(T), and a compressed
position p.

Then the rewriting step is performed by modifying the grammar such that somewhere in the part of the
grammar contributing to t, $L_i$ is replaced by $R_i$. This will also generate an extension of $G_t$ on the
fly, and a copy of the STG $G_R$ is also made.

A single-position rewriting step under STG-compression is performed in a similar way. 

Theorem 5.1 Let R be a TRS compressed with $G_R$ and t be a term compressed with an STG G.
Then a sequence of n term rewriting steps, where submatching is a non-deterministic oracle that is
not counted, can be performed in polynomial time. The size increase by n term rewriting steps is
$O(|G_R|^2 n^7 (|G|^2 + |G|(\log n + 2|G_R|) + (\log n + |G_R|)^2))$.

The complexity bound is $O(n^7 \log^2(n))$ depending on the number n of rewrites, $O(|G_t|^2)$ depending on
the size of $G_t$, and $O(|G_R|^4)$ depending on the size of $G_R$. Note that the degree of the polynomial
for the estimation of the worst case running time is worse than the space bound. The term rewriting sequence
has to be constructed (+ 1) and the Plandowski equality check has to be used in every construction step,
which contributes a factor of 3 in the exponent. But note that there are faster deterministic tests [15, 11]
and even faster randomized equality checks [10, 3, 24].

Single-position rewriting requires a partial decompression of the redex position (similar to the parallel), 
which leads to an extra increase in the size of the STG, but to the same, still polynomial, complexity. 
Combining the results on submatching and sequences of rewriting, we obtain the following corollaries: 

Corollary 5.2 Let R be an STG-compressed TRS and t be an STG-compressed term. Then a sequence
of n term rewriting steps using the submatching algorithm in Subsection 4.3 can be performed in
non-deterministic polynomial time.

Proof. This follows from Theorems 5.1 and 4.4. □



M. Schmidt-Schauss 
Corollary 5.3 Let R be a left-linear STG-compressed TRS and t be an STG-compressed term. Then n
term rewriting steps where the submatching algorithm in Section 3 is used can be performed in
polynomial time.

Proof. This follows from Theorems 5.1 and 3.7. □



Corollary 5.4 Let R be a TRS with DAG-compressed left-hand sides and STG-compressed right hand
sides and let t be an STG-compressed term. Then n term rewriting steps where the submatching algorithm
in Subsection 4.2 is used can be performed in polynomial time in n.

Proof. This follows from Theorems 5.1 and 4.3. □



Corollary 5.5 Let R be an STG-compressed TRS and t be an STG-compressed term, such that the left
hand side of every rule has at most $|G|$ occurrences of variables. Then n term rewriting steps (see
Remark 4.5) can be performed in polynomial time in n.



6 Conclusion 

We have constructed several polynomial algorithms for finding a submatch under STG-compression, or 
restrictions thereof. It is also shown that n rewrite steps can be performed in polynomial time under 
STG-compression in several cases: left-linear and STG-compressed TRS, DAG-compressed or ground 
left hand sides of rules. Also in the general case of non-linear left hand sides n rewrites can be performed 
non-deterministically in polynomial time, where a search for a redex is required. This is connected to 
the open problem of the exact complexity of computing submatches also for non-linear terms. 
A connection to the results in [1] on polynomial runtime complexity is that our results also imply that
for TRSs with polynomial runtime complexity the (single-position and parallel) rewriting can be
implemented such that n rewrite steps can be performed in polynomial time.

A remaining open question is whether the general STG-compressed submatching (of nonlinear terms s 
in t) can be solved in polynomial time or not. 